Rank | Count | Beginning |
---|---|---|
5658 | 166 | Nuo |
6594 | 132 | Po |
3183 | 128 | Istorija |
8241 | 114 | Tai |
8324 | 100 | Taip |
3611 | 94 | Jis |
8122 | 91 | Tačiau |
3022 | 81 | Iš |
1491 | 79 | Biografija |
9276 | 78 | Vėliau |
9772 | 66 | Yra |
3982 | 65 | Kai |
2028 | 61 | Dėl |
3523 | 60 | Ji |
6341 | 60 | Per |
4847 | 58 | Lietuvos |
3737 | 55 | Jo |
8909 | 54 | Tuo |
3542 | 45 | Jie |
1362 | 44 | Be |
7433 | 40 | Ši |
7691 | 39 | Šiuo |
5938 | 38 | Pagal |
5601 | 37 | Nors |
7625 | 37 | Šis |
2885 | 35 | Iki |
2844 | 34 | Į |
7579 | 31 | Šios |
1683 | 30 | Buvo |
5039 | 29 | Manoma, |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus suffices to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
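The same count can be sketched outside the database. The following Python function is a minimal illustration, not the code used to produce the table: it assumes the sentences are available as a plain list of strings, and the function name `sentence_beginnings` and the sample sentences are invented for this example. Like `substring_index(sentence, ' ', N)`, it keeps everything before the N-th space, so attached punctuation (as in "Manoma,") stays part of the beginning.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper mirroring the SQL query: group by the text before
    the n-th space, count, and sort by descending count.
    """
    counts = Counter()
    for sentence in sentences:
        # Equivalent of substring_index(sentence, ' ', n): split on single
        # spaces and keep the first n pieces (the whole sentence if shorter).
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample sentences, for illustration only.
    sample = [
        "Nuo ryto lijo.",
        "Nuo vakar šalta.",
        "Po pietų išėjo.",
        "Manoma, kad taip.",
    ]
    for beginning, count in sentence_beginnings(sample, n=1):
        print(count, beginning)
```

Passing `n=2`, `n=3`, or `n=4` gives the longer beginnings covered in the following subsections; in the SQL query the corresponding change would be the third argument of `substring_index`.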